Cost-Minimizing Online Algorithms for Geo-Distributed Data Analytics
نویسندگان
چکیده
منابع مشابه
Bohr: Similarity Aware Geo-distributed Data Analytics
We propose Bohr, a similarity aware geo-distributed data analytics system that minimizes query completion time. The key idea is to exploit similarity between data in different data centers (DCs), and transfer similar data from the bottleneck DC to other sites with more WAN bandwidth. Though these sites have more input data to process, these data are more similar and can be more efficiently aggr...
متن کاملWANalytics: Analytics for a Geo-Distributed Data-Intensive World
Large organizations today operate data centers around the globe where massive amounts of data are produced and consumed by local users. Despite their geographically diverse origin, such data must be analyzed/mined as a whole. We call the problem of supporting rich DAGs of computation across geographically distributed data Wide-Area Big-Data (WABD). To the best of our knowledge, WABD is not supp...
متن کاملLow Latency Geo-distributed Data Analytics – Public Review
Large cloud service providers ingest massive amounts of data in geographically distributed sites spread across the globe. Analytics for such planetary-scale datasets is an important emerging challenge. The current practice is to copy all data to a central location, where it can be dealt with locally by standard data analytics stacks such as Hadoop and Spark. However, transferring large volumes ...
متن کاملOnline Migration for Geo-distributed Storage Systems
We consider the problem of migrating user data between data centers. We introduce distributed storage overlays, a simple abstraction that represents data as stacked layers in different places. Overlays can be readily used to cache data objects, migrate these caches, and migrate the home of data objects. We implement overlays as part of a key-value object store called Nomad, designed to span man...
متن کاملPingAn: An Insurance Scheme for Job Acceleration in Geo-distributed Big Data Analytics System
Geo-distributed data analysis in a cloud-edge system is emerging as a daily demand. Out of saving time in wide area data transfer, some tasks are dispersed to the edge clusters satisfying data locality. However, execution in the edge clusters is less well, due to limited resource, overload interference and cluster-level unreachable troubles, which obstructs the guarantee on the speed and comple...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: IEEE Access
سال: 2019
ISSN: 2169-3536
DOI: 10.1109/access.2019.2951682